lpEdit: an editor to facilitate reproducible analysis via literate programming
Abstract
There is evidence to suggest that a surprising proportion of published experiments in science are difficult if not impossible to reproduce. The concepts of data sharing, leaving an audit trail and extensive documentation are fundamental to reproducible research, whether in the laboratory or as part of an analysis. In this work, we introduce a tool for documentation that aims to make analyses more reproducible in the general scientific community. The application, lpEdit, is a cross-platform editor, written with PyQt4, that enables a broad range of scientists to carry out the analytic component of their work in a reproducible manner through the use of literate programming. Literate programming mixes code and prose to produce a final report that reads like an article or book. lpEdit targets researchers getting started with statistics or programming, so the hurdles associated with setting up a proper pipeline are kept to a minimum and the learning burden is reduced through the use of templates and documentation. The documentation for lpEdit is centered around learning by example, and accordingly we use several increasingly involved examples to demonstrate the software's capabilities. We first consider applications of lpEdit to process analyses mixing R and Python code with the LaTeX documentation system. Finally, we illustrate the use of lpEdit to conduct a reproducible functional analysis of high-throughput sequencing data, using the transcriptome of the butterfly species Pieris brassicae.
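To make the idea of mixing code and prose concrete, the following is a minimal sketch, not lpEdit's implementation or file format. It assumes a hypothetical noweb-style source in which a code chunk opens with `<<name>>=` and closes with a line containing only `@`; the names `SOURCE` and `split_literate` are illustrative only.

```python
import re

# Hypothetical noweb-style source: prose lines, with code chunks opened by
# "<<name>>=" and closed by a line containing only "@".
SOURCE = """\
We compute the mean of three measurements.
<<mean>>=
values = [2.0, 4.0, 6.0]
mean = sum(values) / len(values)
@
The mean is reported in the final document.
"""

CHUNK_OPEN = re.compile(r"^<<(.*)>>=\s*$")


def split_literate(text):
    """Return (prose_lines, code_lines) from a noweb-style document."""
    prose, code = [], []
    in_chunk = False
    for line in text.splitlines():
        if not in_chunk and CHUNK_OPEN.match(line):
            in_chunk = True           # chunk header: switch to code mode
        elif in_chunk and line.strip() == "@":
            in_chunk = False          # chunk terminator: back to prose
        elif in_chunk:
            code.append(line)
        else:
            prose.append(line)
    return prose, code


prose, code = split_literate(SOURCE)

# Run the extracted code, then combine the prose with the computed result.
namespace = {}
exec("\n".join(code), namespace)
report = "\n".join(prose) + "\nmean = {}".format(namespace["mean"])
print(report)
```

Because the reported number is recomputed from the embedded code each time the document is processed, the prose and the result cannot drift apart, which is the property that makes this style attractive for reproducible analysis.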
Similar resources
Language-Agnostic Reproducible Data Analysis Using Literate Programming
A modern biomedical research project can easily contain hundreds of analysis steps and lack of reproducibility of the analyses has been recognized as a severe issue. While thorough documentation enables reproducibility, the number of analysis programs used can be so large that in reality reproducibility cannot be easily achieved. Literate programming is an approach to present computer programs ...
SGML-Lite: An SGML-based Programming Environment
Literate Programming is a documentation method that attempts to maintain consistency among the various design and program documents of a software system. Unfortunately the majority of the literate programming tools do not have appropriate user interfaces and require the users to learn complicated and cryptic tagging languages. SGML is a metalanguage used to specify markup or tagging languages t...
Three Issues in the Use of Versioned Hypermedia for Software Development Systems
The Software Concordance project is extending the concept of literate programming with research on how modern document and hypermedia services can improve software development environments. The Software Concordance editor is both a syntax-recognizing Java program editor and an XML document editor. It has a uniform document model, based on XML, that lets Java source code documents include both h...
MathModelica: An Extensible Modeling and Simulation Environment with Integrated Graphics and Literate Programming
MathModelica is an integrated interactive development environment for advanced system modeling and simulation. The environment integrates Modelica-based modeling and simulation with graphic design, advanced scripting facilities, integration of program code, test cases, graphics, documentation, mathematical typesetting, and symbolic formula manipulation provided via Mathematica. The user interf...
An SGML-based programming environment for literate programming
... it to compile it afterwards. However, programs are text, so they can benefit from recent results in document research. We propose a new underlying structure for literate programs, based on SGML. As a result, we will use SGML tools to view and modify literate programs. These tools will isolate programmers from the low level structure of literate programs. Moreover, literate programming can be ...